Support eventdb to record reported alarms #5
Improvements:
@jjin62 I have a concern about the description -> type-id mirror. If we do this, the severity field is dropped automatically, since records that share a type-id also use that type-id's severity. My experiment is attached:
I tried to raise a CRITICAL alarm, but in the end it used …
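To make the concern concrete, here is a minimal sketch of how a profile-driven lookup could lose the reported severity. Everything in it is an assumption for illustration: the `ALARM_PROFILE` table, the `EXAMPLE_TYPE_ID` key, the `MINOR` value, and both function names are hypothetical and are not taken from eventd or from this PR.

```python
from dataclasses import dataclass

# Hypothetical profile table: one entry per type-id, each with a single severity.
ALARM_PROFILE = {
    "EXAMPLE_TYPE_ID": {"severity": "MINOR", "description": "Example alarm"},
}


@dataclass
class AlarmRecord:
    type_id: str
    severity: str
    description: str


def record_from_profile(type_id, reported_severity):
    """Build a record purely from the profile entry for this type-id.

    The reported severity is ignored, which is the behaviour the comment
    above is worried about.
    """
    entry = ALARM_PROFILE[type_id]
    return AlarmRecord(type_id, entry["severity"], entry["description"])


def record_keep_reported(type_id, reported_severity):
    """Alternative: take the description from the profile, keep the reported severity."""
    entry = ALARM_PROFILE[type_id]
    severity = reported_severity or entry["severity"]
    return AlarmRecord(type_id, severity, entry["description"])


lost = record_from_profile("EXAMPLE_TYPE_ID", "CRITICAL")
kept = record_keep_reported("EXAMPLE_TYPE_ID", "CRITICAL")
print(lost.severity, kept.severity)
```

With the first function a CRITICAL report comes back with the profile's severity; the second keeps CRITICAL and falls back to the profile only when no severity is reported.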
7c6e83b to 6e94484
I tried using sonic-event.yang to generate the command lines, and they are working now. https://github.com/sonic-molex/sonic-swss supports only events, not alarms. (Out of Gain Range is not a good demo case. Eventdb treats an event with action syslog as an alarm.)
Support sonic-event.yang CLI generation in yang_auto_cli.sh
6e94484 to ad1948e
Open questions:
* Support eventdb to record reported alarms
* Add description in alarmDB record

Support sonic-event.yang CLI generation in yang_auto_cli.sh
…net#25643)

* [build] Add build timing report and dependency analysis tools

Add three scripts for build performance instrumentation:
- scripts/build-timing-report.sh: Parse per-package timing from build logs (HEADER/FOOTER timestamps), generate sorted duration table, phase breakdown, parallelism timeline, and CSV export.
- scripts/build-dep-graph.py: Parse rules/*.mk dependency graph, compute critical path, fan-out/fan-in bottleneck analysis, and generate DOT/JSON output for visualization.
- scripts/build-resource-monitor.sh: Sample CPU, memory, disk I/O, and Docker container count during builds for resource utilization analysis.

Add "make build-report" target to slave.mk that runs the timing report and dependency analysis after a build completes.

Example output from a VS build on 24-core/30GB machine:
- 210 packages built in 53m wall time (173m CPU)
- Max concurrency: 5 (with SONIC_CONFIG_BUILD_JOBS=4)
- Critical path: 14 packages deep (libnl -> libswsscommon -> utilities)
- Top bottleneck: LIBSWSSCOMMON with 48 downstream dependents

Signed-off-by: Rustiqly <rustiqly@users.noreply.github.com>

* Address Copilot review: fix 17 bugs in build analysis scripts

- Use free -m with division instead of free -g to avoid rounding (#1)
- Add = and ?= to Makefile dependency regex patterns (#2, #7)
- CPU calculation now uses /proc/stat delta (two reads) (#3, #14)
- Fix misleading 'critical path estimate' comment (#4)
- Fix parallelism timeline comment (60s not 10s) (#5)
- Include after-relationship packages in fan stats (#6)
- Guard disk I/O division by zero when INTERVAL<=1 (#8)
- Remove unused elapsed_line variable (#9)
- Remove redundant LIBSWSSCOMMON_DBG check (#10)
- Remove active_make_jobs from CSV header comment (#11)
- Wire up _RDEPENDS parsing to build reverse deps (#12)
- Remove unnecessary 'if v' filter on rdeps JSON (#13)
- Remove unused REPORT_FORMAT parameter (#15)
- Add cycle detection to critical path algorithm (#16)
- Add execute permission check for companion scripts (#17)

Signed-off-by: Rustiqly <rustiqly@users.noreply.github.com>

---------

Signed-off-by: Rustiqly <rustiqly@users.noreply.github.com>
Co-authored-by: Rustiqly <rustiqly@users.noreply.github.com>
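For readers unfamiliar with the critical-path item above, the idea can be sketched as a depth-first search with cycle detection. This is an illustrative sketch only, not the code in scripts/build-dep-graph.py: the function name, the dictionary input format, and the three-package toy graph are assumptions, and depth is counted in packages rather than build time.

```python
def critical_path(deps):
    """Return the longest dependency chain, deepest dependency first.

    deps maps a package name to the list of packages it depends on.
    Raises ValueError if the graph contains a cycle.
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {}
    best = {}  # package -> longest chain that ends at this package

    def visit(pkg):
        status = state.get(pkg, UNVISITED)
        if status == IN_PROGRESS:
            # Reaching a package that is still on the DFS stack means a cycle.
            raise ValueError(f"dependency cycle through {pkg}")
        if status == DONE:
            return best[pkg]
        state[pkg] = IN_PROGRESS
        longest = []
        for dep in deps.get(pkg, ()):
            chain = visit(dep)
            if len(chain) > len(longest):
                longest = chain
        state[pkg] = DONE
        best[pkg] = longest + [pkg]
        return best[pkg]

    result = []
    for pkg in list(deps):
        chain = visit(pkg)
        if len(chain) > len(result):
            result = chain
    return result


# Toy graph shaped like the chain named in the example output above.
toy = {
    "utilities": ["libswsscommon"],
    "libswsscommon": ["libnl"],
    "libnl": [],
}
print(" -> ".join(critical_path(toy)))
```

Memoising the longest chain per package keeps the search linear in the number of edges, and the in-progress marker is what turns a cyclic rules file into an error instead of unbounded recursion.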
…dating udevd rules (sonic-net#26343)

- Why I did it

On SONiC SmartSwitch platforms with DPUs, systemd-udevd crashes with SIGABRT on every reboot when DPU firmware initialization is slow.

During the initramfs boot phase, a standalone systemd-udevd daemon is started to handle device discovery. If DPU firmware takes longer than the 60-second udevadm settle timeout (BlueField-3 DPUs can take 120 seconds each in the failure case when they are stuck), the initramfs cannot stop this udevd before switch_root. The stale process survives into the real system but is never chrooted into the overlayfs root, leaving it with a broken filesystem view.

When dpu-udev-manager.sh writes udev rules, the stale udevd detects the change and crashes on an assertion in systemd's chase() path resolution (assert(path_is_absolute(p)) at chase.c:648), because dir_fd_is_root() returns false for a process whose root still points to the initramfs rootfs rather than the overlayfs.

This triggers a systemd issue, systemd/systemd#29559, which the maintainers do not consider a bug on the systemd side, so this fix is raised for our use case.

Core was generated by `/usr/lib/systemd/systemd-udevd --daemon --resolve-names=never'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f29fe7f695c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007f29fe7f695c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f29fe7a1cc2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f29fe78a4ac in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007f29fea50c11 in ?? () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#4 0x00007f29feb94a8b in chase () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#5 0x00007f29feb956e2 in chase_and_opendir () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#6 0x00007f29feb9a609 in conf_files_list_strv () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#7 0x00007f29fea913e8 in config_get_stats_by_path () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#8 0x0000559f295519cf in ?? ()
#9 0x0000559f29553a77 in ?? ()
#10 0x00007f29fec36055 in ?? () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#11 0x00007f29fec3668d in sd_event_dispatch () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#12 0x00007f29fec394a8 in sd_event_run () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#13 0x00007f29fec396c7 in sd_event_loop () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#14 0x0000559f29545820 in ?? ()
#15 0x00007f29fe78bca8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#16 0x00007f29fe78bd65 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#17 0x0000559f29545c51 in ?? ()

- How I did it

Added a kill_stale_udevd() function to dpu-udev-manager.sh that runs before writing the udev rules. It identifies the systemd-managed udevd PID via systemctl show, then kills any other systemd-udevd --daemon process that doesn't match; these are leftover initramfs instances. If no stale process exists (e.g. DPUs are healthy and the initramfs udevd exited cleanly), the function is a no-op.
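The selection step described above can be sketched as follows. This is not the shell function from dpu-udev-manager.sh: the Python helper names, the `(pid, cmdline)` input format, and the sample PIDs are assumptions for illustration. The only external detail relied on is that `systemctl show` prints properties as `Name=value` lines.

```python
def parse_main_pid(systemctl_output):
    """Extract the PID from a 'MainPID=<n>' line as printed by systemctl show."""
    for line in systemctl_output.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "MainPID":
            return int(value)
    return 0  # property not present in the given text


def find_stale_udevd(processes, managed_pid):
    """Return PIDs of 'systemd-udevd --daemon' processes other than managed_pid.

    processes is an iterable of (pid, cmdline) pairs, e.g. gathered from ps.
    """
    stale = []
    for pid, cmdline in processes:
        args = cmdline.split()
        is_daemon_udevd = (
            bool(args)
            and args[0].endswith("systemd-udevd")
            and "--daemon" in args
        )
        if is_daemon_udevd and pid != managed_pid:
            stale.append(pid)
    return stale


# Hypothetical process table: one leftover initramfs udevd, one managed udevd.
sample = [
    (212, "/usr/lib/systemd/systemd-udevd --daemon --resolve-names=never"),
    (812, "/usr/lib/systemd/systemd-udevd"),
    (900, "/usr/sbin/sshd -D"),
]
managed = parse_main_pid("MainPID=812\n")
print(find_stale_udevd(sample, managed))
```

Keeping the selection separate from the kill makes the no-op case explicit: an empty list means there is nothing to signal.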
- How to verify it

Deploy the image on a SmartSwitch with DPUs in a state where firmware initialization times out (>60s per DPU) by stopping image installation before the firmware install step.
Reboot the switch.
Verify no new systemd-udevd coredumps in /var/core/.
Verify the stale process was killed: journalctl -b 0 | grep dpu-udev-manager should show killing stale initramfs udevd PID (systemd udevd is PID ).
Verify systemd-udevd.service is healthy: systemctl status systemd-udevd should show active (running).
Verify DPU udev rules were written: cat /etc/udev/rules.d/92-midplane-intf.rules should contain the DPU interface naming rules.

Signed-off-by: Hemanth Kumar Tirupati <tirupatihemanthkumar@gmail.com>
Why I did it
Work item tracking
How I did it
sonic-buildimage/device/molex/x86_64-otn-kvm_x86_64-r0/default.json
sonic-eventd-otn-profile debian package to apply device level alarm list.
sonic-eventd-otn-profile will copy ONIE_PLATFORM alarm list and syslog plugin to eventd.
Architecture
Bug fix and changes